【Question title】: ggplot(): Error in FUN(X[[i]], ...) : object not found
【Posted】: 2018-06-28 15:55:56
【Question】:

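I want to make a chart in R that compares Trump's word usage with that of Hillary Clinton and Barack Obama. To do this, I followed the approach from this site: https://www.tidytextmining.com/tidytext.html#word-frequencies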

ggplot(frequency, aes(x = proportion, y = `Donald Trump`, color = abs(`Donald Trump` - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  facet_wrap(~author, ncol = 2) +
  theme(legend.position="none") +
  labs(y = "Donald Trump", x = NULL)

My data frame looks like this: [image]. However, I keep getting this error:

Error in FUN(X[[i]], ...) : object 'Donald Trump' not found

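The error seems to be related to ggplot(). I have tried changing the call in several ways, but I cannot find the mistake. I hope you can help. Thanks in advance!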

【Comments】:

  • There is no `Donald Trump` column in your data.frame.

Tags: r ggplot2
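A quick way to confirm this is to inspect the column names before plotting. The sketch below uses a small made-up data frame (`frequency_long` and its values are hypothetical, and it assumes the asker's data is in long format with an `author` column, which the lost screenshot cannot confirm). It shows that `Donald Trump` can be a value in a column without being a column that `aes()` can refer to.

```r
# Toy stand-in for the asker's data: long format, one row per author/word.
# The name frequency_long and all values are made up for illustration.
frequency_long <- data.frame(
  author     = c("Donald Trump", "Hillary Clinton", "Barack Obama"),
  word       = c("great", "great", "great"),
  proportion = c(0.004, 0.001, 0.002),
  stringsAsFactors = FALSE
)

# The column names are the only names aes() can look up in the data
names(frequency_long)

# "Donald Trump" is a value in the author column, not a column name
has_trump_column <- "Donald Trump" %in% names(frequency_long)
has_trump_value  <- "Donald Trump" %in% frequency_long$author
has_trump_column
has_trump_value
```

If the first check is `FALSE` on the real data, the `ggplot()` call has no `Donald Trump` column to map and fails with the "object not found" error.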


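【Solution 1】:

I think you just need to spread and gather the data again, as is done in the example you linked. Also note that a reprex would have helped here, so that I would not have had to build one from the example, which may not match your data.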

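# Creating fake data: the tidytext example corpora stand in for the three authors
library(gutenbergr)   # gutenberg_download()
library(tidytext)     # unnest_tokens(), stop_words
library(dplyr)        # mutate(), count(), group_by(), anti_join(), ...
library(janeaustenr)  # austen_books()
library(stringr)      # str_detect(), str_extract(), regex()
library(tidyr)        # spread(), gather()
library(ggplot2)      # ggplot() and the geoms used in the plot
library(scales)       # percent_format()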

hgwells <- gutenberg_download(c(35, 36, 5230, 159))

tidy_hgwells <- hgwells %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)


bronte <- gutenberg_download(c(1260, 768, 969, 9182, 767))

tidy_bronte <- bronte %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)

original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>%
  ungroup()

tidy_books <- original_books %>%
  unnest_tokens(word, text)

tidy_books <- tidy_books %>%
  anti_join(stop_words)

frequency <- bind_rows(mutate(tidy_bronte, author = "Hillary Clinton"),
                       mutate(tidy_hgwells, author = "Barack Obama"), 
                       mutate(tidy_books, author = "Donald Trump")) %>% 
  mutate(word = str_extract(word, "[a-z']+")) %>%
  count(author, word) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>% 
  select(-n) %>% 
  spread(author, proportion) %>% 
  gather(author, proportion, `Hillary Clinton`,`Barack Obama`)

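The last two lines of the pipeline are the ones to apply to your own data frame. They spread the data into one column per author and then gather only the Clinton and Obama columns back into `author` and `proportion`, so `Donald Trump` remains as its own column holding Trump's proportion for each word.

Here is an example of what your data frame should look like: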
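To see what those two steps do in isolation, here is a minimal sketch on a tiny made-up table (the `toy` data frame and its proportions are hypothetical, not taken from the question). It only assumes that `tidyr` is installed.

```r
library(tidyr)

# Made-up proportions, one row per author/word (hypothetical values)
toy <- data.frame(
  author     = c("Donald Trump", "Hillary Clinton", "Barack Obama",
                 "Donald Trump", "Hillary Clinton"),
  word       = c("great", "great", "great", "jobs", "jobs"),
  proportion = c(0.004, 0.001, 0.002, 0.003, 0.002),
  stringsAsFactors = FALSE
)

# spread(): turn the author values into one column per author
wide <- spread(toy, author, proportion)

# gather(): stack only the Clinton and Obama columns back into
# author/proportion, leaving `Donald Trump` as its own column
reshaped <- gather(wide, author, proportion, `Hillary Clinton`, `Barack Obama`)

names(reshaped)
reshaped
```

Words that an author never used show up as `NA` after the spread, which is why the real output also contains `NA` values.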

> head(frequency)
# A tibble: 6 x 4
  word       `Donald Trump` author          proportion
  <chr>               <dbl> <chr>                <dbl>
1 a              0.00000919 Hillary Clinton 0.0000319 
2 aback         NA          Hillary Clinton 0.00000398
3 abaht         NA          Hillary Clinton 0.00000398
4 abandon       NA          Hillary Clinton 0.0000319 
5 abandoned      0.00000460 Hillary Clinton 0.0000916 
6 abandoning    NA          Hillary Clinton 0.00000398

Now it plots as expected.

ggplot(frequency, aes(x = proportion, y = `Donald Trump`, color = abs(`Donald Trump` - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  facet_wrap(~author, ncol = 2) +
  theme(legend.position="none") +
  labs(y = "Donald Trump", x = NULL)
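As a side note, the tidyr documentation marks `spread()` and `gather()` as superseded and recommends `pivot_wider()` and `pivot_longer()` for new code. The following is a hedged sketch of the same reshaping with those functions, assuming tidyr 1.0.0 or later and using a made-up `toy` table rather than the real data; it is an alternative, not what the answer above uses.

```r
library(tidyr)

# Made-up proportions, one row per author/word (hypothetical values)
toy <- data.frame(
  author     = c("Donald Trump", "Hillary Clinton", "Barack Obama",
                 "Donald Trump", "Hillary Clinton"),
  word       = c("great", "great", "great", "jobs", "jobs"),
  proportion = c(0.004, 0.001, 0.002, 0.003, 0.002),
  stringsAsFactors = FALSE
)

# Counterpart of spread(author, proportion): one column per author
wide <- pivot_wider(toy, names_from = author, values_from = proportion)

# Counterpart of gather(author, proportion, `Hillary Clinton`, `Barack Obama`):
# only these two columns are stacked, `Donald Trump` stays a column
reshaped <- pivot_longer(
  wide,
  cols      = c(`Hillary Clinton`, `Barack Obama`),
  names_to  = "author",
  values_to = "proportion"
)

names(reshaped)
```

The row order differs from the `gather()` result, but the columns are the same, so the `ggplot()` call above should work unchanged on either version.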

【Comments】: