【Posted】: 2016-01-05 05:04:43
【Question】:
library(httr)
library(RCurl)
library(XML)
raw <- GET("http://hse.ru/org/persons",
           query = list(udept = "135083"))
raw_text <- content(raw, "text", encoding = "UTF-8")
parsed <- htmlParse(raw_text)
interests <- xpathSApply(parsed,
                         '/html/body/div/div/div/div[2]/div/div/div/div/div/div/div/div',
                         xmlValue)
interests[31:32]
[1] "прикладная эконометрикаЭмпирический анализ рынков(…)"
[2] "современная теория правасоциология права"
For those who are not familiar with Russian, the last two lines translate roughly as:
interests[31:32]
[1] "applied econometricsEmpirical analysis of markets"
[2] "modern legal theorysociology of law"
The objects I am applying xmlValue to have this structure (translated):
[[1]]
<div class="with-indent small">
<a class="tag" href="/org/persons/?intst=62024792">applied econometrics</a>
<a class="tag" href="/org/persons/?intst=62247389">empirical analysis of markets</a>
(...)
</div>
[[2]]
<div class="with-indent small">
<a class="tag" href="/org/persons/?intst=132077027">modern legal theory</a>
(...)
<a class="tag" href="/org/persons/?intst=52953762">sociology of law</a>
</div>
I would like to know how to add a space (or "; ") as a separator between the values, so that I get the following:
interests[31:32]
[1] "applied econometrics Empirical analysis of markets"
[2] "modern legal theory sociology of law"
I tried combining xmlValue with paste, as in xpathSApply(parsed, 'pattern', paste(xmlValue, " ")), and received an error.
【Comments】:
-
Can you share the error?
-
@Ouistiti
Error in paste(xmlValue, " ") : cannot coerce type 'closure' to vector of type 'character'
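-
The error arises because paste(xmlValue, " ") is evaluated before xpathSApply ever runs, so paste receives the function xmlValue itself (a closure) rather than any text. One possible workaround, sketched below, is to pass an anonymous function that collects the text of each child a node and joins the pieces with paste(..., collapse = ). This is only a sketch under assumptions: the inline HTML is a cut-down copy of the translated structure from the question (the entries elided as "(...)" are left out), and the class-based XPath expressions are illustrative, not the path used in the question.

```r
library(XML)

# Assumption: a small inline sample mirroring the translated structure in the
# question, so the example runs without fetching the real page.
html <- '<html><body>
<div class="with-indent small">
<a class="tag" href="/org/persons/?intst=62024792">applied econometrics</a>
<a class="tag" href="/org/persons/?intst=62247389">empirical analysis of markets</a>
</div>
<div class="with-indent small">
<a class="tag" href="/org/persons/?intst=132077027">modern legal theory</a>
<a class="tag" href="/org/persons/?intst=52953762">sociology of law</a>
</div>
</body></html>'

parsed <- htmlParse(html, asText = TRUE)

# For each matching div, extract the text of every child <a class="tag">
# and join the values with "; " (use " " for a plain space instead).
interests <- xpathSApply(
  parsed,
  '//div[@class="with-indent small"]',
  function(node) {
    tags <- unlist(xpathSApply(node, './a[@class="tag"]', xmlValue))
    paste(trimws(tags), collapse = "; ")
  }
)

print(interests)
```

The key point is that the third argument of xpathSApply must be a function applied to each node; the separator is added inside that function, once the individual tag values are available as a character vector.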
Tags: xml r xml-parsing whitespace space