Posted: 2018-04-03 04:37:39
Question:
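Is it possible to use the rvest package to read the text that sits after an input type="radio" tag and is followed by the tag span class="glyphicon glyphicon-ok"? For example, I want to read "Carbohydrates and fats" into a character vector.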
R code (does not work; NA is stored in p_ans):
install.packages('rvest')   # only needed once
library(rvest)
url <- 'http://upscfever.com/upsc-fever/en/test/en-test-sci1.html'
webpage <- read_html(url)
p_ans <- webpage %>%
  html_nodes("input + glyphicon-ok") %>%
  html_text()
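A quick way to see why the selector above finds nothing is to run it against a copy of one answer option. This sketch is not from the original post: it parses the first option from the HTML code in this question as an inline string instead of fetching the URL, and it assumes the live page serves that markup statically (not injected by JavaScript). It shows two separate problems: "glyphicon-ok" without a leading dot is a tag-name selector, and even the class selector "span.glyphicon-ok" only reaches an empty span, because the answer text is a text node inside the label, not inside the span.

```r
library(rvest)

# One answer option copied from the question's HTML (the URL is not fetched).
html <- '<div class="radio"><label><input type="radio" value="1" name="optradio0">Carbohydrates and fats<span class="glyphicon glyphicon-ok"></span></label></div>'
doc <- read_html(html)

# "glyphicon-ok" without a dot is treated as an element name, so nothing matches.
length(html_nodes(doc, "input + glyphicon-ok"))

# With a class selector the <span> is found (CSS "+" skips the text node
# between <input> and <span>), but the span itself has no text.
html_text(html_nodes(doc, "input + span.glyphicon-ok"))
```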
HTML code:
<div class="form-group" id="myform">
<label for="usr">Q1: Energy giving foods are </label>
</div>
<div class="radio">
<label><input type="radio" value="1" name="optradio0">Carbohydrates and fats<span class="glyphicon glyphicon-ok"></span></label>
</div>
<div class="radio">
<label><input type="radio" id="opt1" value="-0.33" name="optradio0">Carbohydrates and Proteins<span id="sp1" class="glyphicon glyphicon-remove"></span></label>
</div>
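One way to get the answer text is to select the label that contains the glyphicon-ok span and read the label's own text: the input and the empty span contribute nothing, so only the answer string remains. This is a sketch, not a confirmed answer: it parses the two options shown above as an inline string rather than fetching the URL, and it uses an XPath expression because "a label that has such a span as a child" is easy to state as an XPath predicate.

```r
library(rvest)

# The two options from the HTML above (the live URL is not fetched here).
html <- '
<div class="radio">
<label><input type="radio" value="1" name="optradio0">Carbohydrates and fats<span class="glyphicon glyphicon-ok"></span></label>
</div>
<div class="radio">
<label><input type="radio" id="opt1" value="-0.33" name="optradio0">Carbohydrates and Proteins<span id="sp1" class="glyphicon glyphicon-remove"></span></label>
</div>'
doc <- read_html(html)

# Pick every <label> with a child <span> whose class list includes
# "glyphicon-ok", then take the label's text with surrounding whitespace trimmed.
p_ans <- doc %>%
  html_nodes(xpath = "//label[span[contains(concat(' ', normalize-space(@class), ' '), ' glyphicon-ok ')]]") %>%
  html_text(trim = TRUE)

p_ans
#> [1] "Carbohydrates and fats"
```

Against the live page the same pipe would start from read_html(url); whether it works there depends on the glyphicon-ok span actually being present in the HTML the server returns.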
Comments:
- A very clever way to capture all the correct answers to a particular quiz ;-)
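Capturing every correct answer on a quiz page can go one step further and pair each answer with its question. A minimal sketch, assuming every question follows the layout shown in the post (a div.form-group holding the question label, followed by div.radio options); only the Q1 markup from the question is used, and the XPath walk from each matching label back to its question is an illustration, not a tested recipe for the live site.

```r
library(rvest)

# Q1 markup from the question; the full page is not fetched here.
html <- '
<div class="form-group" id="myform">
<label for="usr">Q1: Energy giving foods are </label>
</div>
<div class="radio">
<label><input type="radio" value="1" name="optradio0">Carbohydrates and fats<span class="glyphicon glyphicon-ok"></span></label>
</div>
<div class="radio">
<label><input type="radio" id="opt1" value="-0.33" name="optradio0">Carbohydrates and Proteins<span id="sp1" class="glyphicon glyphicon-remove"></span></label>
</div>'
doc <- read_html(html)

# Every label holding a correct-answer marker.
ok_labels <- html_nodes(doc, xpath = "//label[span[contains(concat(' ', normalize-space(@class), ' '), ' glyphicon-ok ')]]")

# From each correct answer, climb to its div.radio, then step back to the
# nearest preceding div.form-group and read the question label.
questions <- ok_labels %>%
  html_node(xpath = "./ancestor::div[contains(@class, 'radio')][1]/preceding-sibling::div[contains(@class, 'form-group')][1]/label") %>%
  html_text(trim = TRUE)

answers <- data.frame(
  question = questions,
  answer   = html_text(ok_labels, trim = TRUE),
  stringsAsFactors = FALSE
)
answers
```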
Tags: r web-scraping rvest