【Question title】: Tweets returned by twitteR are shortened
【Posted】: 2018-04-17 22:07:52
【Question】:

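I am using the R twitteR package to collect some tweets. However, I noticed that the tweet text returned by the searchTwitter function is not the complete tweet text. Instead, it is cut to exactly 140 characters, and the rest of the text is replaced by a link to the tweet on the web.

Using a tweet I found as an example: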

require(twitteR)
require(ROAuth)

# authorize twitter with consumer and access key/secret
setup_twitter_oauth(AAA, BBB, CCC, DDD)   # actual secret codes go here...

# get sample tweet
tweet <- searchTwitter("When I was driving around earlier this afternoon I only saw two Hunters",
                       n=500,
                       since = "2017-11-04",
                       until = "2017-11-05",
                       retryOnRateLimit=5000)

# print tweet
tweet[[1]]
[1] "_TooCrazyFox_: When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn'… *SHORTENEDURL*"
# the *SHORTENEDURL* is actually a link that brings you to the tweet; stackoverflow did not let me put shortened URLs in here

# convert to data frame
df <- twListToDF(tweet)

# output text and ID
df$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn'… *SHORTENEDURL*"

df$id
[1] "926943636641763328"

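If I go to this tweet via my web browser, it is clear that twitteR shortened the text to 140 characters and appended a link to the tweet that contains the whole text.

I don't see any mention of this in the twitteR documentation. Is there a way to retain the entire tweet text during a search?

My assumption is that this is related to the change in Twitter's character limit described here: https://developer.twitter.com/en/docs/tweets/tweet-updates (under "Compatibility mode JSON rendering"). That would mean I need to retrieve the full_text field rather than the text field. However, twitteR does not seem to provide it.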
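To make the symptom concrete, here is a minimal base-R sketch that flags text which looks like a compatibility-mode truncation, that is, text ending in an ellipsis followed by a t.co link. The helper name is_truncated, the regular expression, and the placeholder URL are assumptions for illustration; none of them come from twitteR or rtweet, and the sample strings are abbreviated stand-ins for the tweet above.

```r
# Hypothetical helper (not part of twitteR or rtweet): returns TRUE for text
# that ends with an ellipsis character followed by a t.co link.
is_truncated <- function(text) {
  grepl("\u2026\\s*https?://t\\.co/[[:alnum:]]+\\s*$", text)
}

# Abbreviated example inputs; the t.co URL is a made-up placeholder.
samples <- c(
  "When I was driving around earlier this afternoon I didn'\u2026 https://t.co/XXXXXXXXXX",
  "When I was driving around earlier this afternoon. I'll do it later., maybe tomorrow.\n#harvest17"
)

# The first entry is flagged as truncated, the second is not.
flags <- is_truncated(samples)
print(flags)
```

Applied to a column such as df$text, a check like this would show how many of the collected tweets were cut off, assuming the truncated text follows the pattern seen in the example.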

【Comments】:

  • When you click the SHORTENEDURL, does it redirect to the tweet?
  • @HenryNavarro You are right, it does.

Tags: r string twitter url-shortener twitter-r


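【Solution 1】:

The twitteR package is in the process of being deprecated. You should use rtweet instead.

You can download rtweet from CRAN, but for now I recommend installing the development version from GitHub. By default, the development version returns the full text of tweets. It also returns the full original text of retweeted or quoted statuses.

To install the latest version of rtweet from GitHub, use the devtools package.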

## install newest version of rtweet
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("mkearney/rtweet")

Once it is installed, load the rtweet package.

## load rtweet
library(rtweet)

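rtweet has a dedicated package documentation website. It includes a vignette on obtaining and using Twitter API access tokens. If you follow the steps in the vignette, you only need to complete the authorization process once [per machine].

To search for tweets, use the search_tweets() function.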

# get sample tweet
rt <- search_tweets(
  "When I was driving around earlier this afternoon I only saw two Hunters",
  n = 500
)

Print the output (a tbl data frame).

> rt
# A tibble: 1 x 42
           status_id          created_at    user_id   screen_name
               <chr>              <dttm>      <chr>         <chr>
1 926943636641763328 2017-11-04 22:45:59 3652909394 _TooCrazyFox_
# ... with 38 more variables: text <chr>, source <chr>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, hashtags <list>, symbols <list>,
#   urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
#   media_url <list>, media_t.co <list>, media_expanded_url <list>,
#   media_type <list>, ext_media_url <list>, ext_media_t.co <list>,
#   ext_media_expanded_url <list>, ext_media_type <lgl>,
#   mentions_user_id <list>, mentions_screen_name <list>, lang <chr>,
#   quoted_status_id <chr>, quoted_text <chr>, retweet_status_id <chr>,
#   retweet_text <chr>, place_url <chr>, place_name <chr>,
#   place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>,
#   geo_coords <list>, coords_coords <list>, bbox_coords <list>

Print the tweet text (the full text).

> rt$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"
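As a quick sanity check, the length of the returned text can be compared against the old 140-character limit. The sketch below stores the text printed above as a literal so it runs without Twitter access; the variable names full_text and n_chars are illustrative only.

```r
# The full text printed above, stored as a literal so the check runs offline.
full_text <- "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"

# Count characters and compare with the legacy 140-character limit.
n_chars <- nchar(full_text)
print(n_chars > 140)
```

With a live result, the same comparison would be nchar(rt$text) > 140.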

To look up Twitter statuses by ID, use the lookup_statuses() function.

## lookup tweet
tweet <- lookup_statuses("926943636641763328")

Print the tweet text.

> tweet$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"

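【Comments】:

  • Thanks for the new and improved package! I have been testing it, and converting my existing code to rtweet seems straightforward. Just to confirm, there is no equivalent of since/until that allows searching a specific date range? I didn't see anything in the documentation or vignettes, but I thought I would double-check.
  • You can include those parameters in the actual query. For example, search_tweets("driving around earlier since:2017-11-01 until:2017-11-06", n = 100). I have been updating a lot of the documentation lately. I should probably include some examples like this :).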
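
The date-range tip in the last comment can be wrapped in a small helper that appends the since: and until: operators to a query string. This is only a sketch: build_query is a hypothetical function, not part of rtweet, and it assumes dates are given as "YYYY-MM-DD".

```r
# Hypothetical helper (not part of rtweet): append since:/until: search
# operators to a query string. as.Date() rejects strings that are not dates.
build_query <- function(terms, since = NULL, until = NULL) {
  parts <- terms
  if (!is.null(since)) {
    parts <- c(parts, paste0("since:", format(as.Date(since))))
  }
  if (!is.null(until)) {
    parts <- c(parts, paste0("until:", format(as.Date(until))))
  }
  paste(parts, collapse = " ")
}

# Rebuild the query string used in the comment above.
query <- build_query("driving around earlier",
                     since = "2017-11-01",
                     until = "2017-11-06")
print(query)
```

The resulting string could then be passed to search_tweets(query, n = 100), as in the comment.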