【Posted】:2021-05-04 08:43:59
【Question】:
I want to use text() to identify nodes whose text contains umlauts ("Umlaute").
library(xml2)
library(rvest)
doc <- "<p>Über uns </p>" %>% xml2::read_html()
grepl(pattern = "Über uns", x = as.character(doc))
grepl(pattern = "Über uns", x = doc)
Question:
How can I extract the nodes that contain the text "Über uns"?
What I tried:
https://forum.fhem.de/index.php?topic=96254.0
Java XPath umlaut/vowel parsing
# does not work
xp <- paste0("//*[contains(text(), 'Über uns')]")
html_nodes(x = doc, xpath = xp)
# does not work
xp <- paste0("//*[translate(text(), 'Ü', 'U') = 'Uber uns']")
html_nodes(x = doc, xpath = xp)
# does not work
xp <- paste0("//*[contains(text(), 'Über uns')]")
html_nodes(x = doc, xpath = xp)
# This works, but I wonder whether there is a pure XPath solution
doc2 <- doc %>%
  as.character() %>%
  gsub(pattern = "Ü", replacement = "Ue") %>%
  xml2::read_html()
xp <- paste0("//*[contains(text(), 'Ueber uns')]")
html_nodes(x = doc2, xpath = xp)
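For reference, a minimal sketch of an XPath-only variant. It assumes (this is not confirmed anywhere in the post) that the failures come from the string literals being stored in a native, non-UTF-8 encoding, as can happen on Windows. Under that assumption, writing the umlaut as a Unicode escape and passing the strings through enc2utf8() may be enough. The object names (html, xp_contains, xp_translate, hits) are only illustrative. Note also that the paragraph text ends with a space, so an exact "=" comparison needs normalize-space() regardless of encoding.

```r
library(xml2)

# Assumption: the literals must be UTF-8 for libxml2 to match them.
# "\u00dc" is the escape for "Ü", which avoids depending on the file encoding.
html <- enc2utf8("<p>\u00dcber uns </p>")
doc <- read_html(html)

# contains() on the text node, with the expression forced to UTF-8
xp_contains <- enc2utf8("//*[contains(text(), '\u00dcber uns')]")
nodes <- xml_find_all(doc, xp_contains)

# translate() variant: the source text is "Über uns " (trailing space),
# so trim it with normalize-space() before comparing with "="
xp_translate <- enc2utf8(
  "//*[normalize-space(translate(text(), '\u00dc', 'U')) = 'Uber uns']"
)
nodes_translated <- xml_find_all(doc, xp_translate)

# Fallback without umlauts in XPath: select nodes that have text, filter in R
candidates <- xml_find_all(doc, "//*[text()]")
hits <- candidates[grepl("\u00dcber uns", xml_text(candidates), fixed = TRUE)]
```

If encoding is not the cause on a given system, the fallback at the end still avoids putting non-ASCII characters into the XPath expression at all.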
【Discussion】: