[Question title]: Using HtmlAgilityPack in VB.Net to Get Text from a Website
[Posted]: 2015-07-21 01:41:20
[Question description]:

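I am writing a program for my girlfriend. When she opens it, it should automatically fetch her daily horoscope text from a horoscope website and display that text in a TextBox.

What I have so far basically displays the entire website as raw HTML, which is not what I want. This is the HTML I need to grab: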

<div class="fontdef1" style="padding-right:10px;" id="textline">
"You might have the desire for travel, perhaps to visit a friend who lives far away, Gemini. You may actually set the wheels in motion to make it happen. Social events could take up your time this evening, and you could meet some interesting people. A friend might need a sympathetic ear. Today you're especially sensitive to others, so be prepared to hear a sad story. Otherwise, your day should go well. 
</div>

My current code is:

Imports System.Net
Imports System.IO
Imports HtmlAgilityPack

Public Class Form1

    Private Function getHTML(ByVal Address As String) As String
        Dim rt As String = ""

        Dim wRequest As WebRequest
        Dim wResponse As WebResponse

        Dim SR As StreamReader

        wRequest = WebRequest.Create(Address)
        wResponse = wRequest.GetResponse

        SR = New StreamReader(wResponse.GetResponseStream)

        rt = SR.ReadToEnd
        SR.Close()

        Return rt
    End Function

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Label2.Text = Date.Now.ToString("MM/dd/yyyy")
        TextBox1.Text = getHTML("http://my.horoscope.com/astrology/free-daily-horoscope-gemini.html")
    End Sub
End Class

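Thanks for any help you can give me. Honestly, I have no idea where to go with this program right now. It has been 3 days with no progress.

[Question comments]:

    Tags: vb.net html-agility-pack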


    [Solution 1]:

    Learn XPath or LINQ to extract specific information from an HTML document with HtmlAgilityPack. Here is an example console application that uses an XPath selector:

    Imports System
    Imports System.Xml
    Imports HtmlAgilityPack
    
    Public Module Module1
        Public Sub Main()
            Dim link As String = "http://my.horoscope.com/astrology/free-daily-horoscope-gemini.html"
            ' Download the page at the link into an HtmlDocument
            Dim doc As HtmlDocument = New HtmlWeb().Load(link)
            ' Select the first <div> whose class attribute equals fontdef1
            Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//div[@class='fontdef1']")
            ' If the div is found, print its inner text
            If div IsNot Nothing Then
                Console.WriteLine(div.InnerText.Trim())
            End If
        End Sub
    End Module
    

    Dotnetfiddle Demo

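    Output:

    You might have the desire for travel, perhaps to visit a friend who lives far away, Gemini. You may actually set the wheels in motion to make it happen. Social events could take up your time this evening, and you could meet some interesting people. A friend might need a sympathetic ear. Today you're especially sensitive to others, so be prepared to hear a sad story. Otherwise, your day should go well.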
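The same selection can probably also be written with LINQ instead of XPath. The following is only a minimal sketch: the module name `LinqSketch` and the inline sample markup are assumptions made so the example runs without a network connection; in the real program the document would come from `New HtmlWeb().Load(link)` as in the XPath example.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Public Module LinqSketch
    Public Sub Main()
        ' Hypothetical sample markup standing in for the downloaded page.
        Dim html As String = "<html><body><div class=""fontdef1"" id=""textline""> Sample horoscope text. </div></body></html>"

        ' Parse the markup from a string instead of downloading it.
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Enumerate all <div> descendants and take the first one whose
        ' class attribute equals fontdef1 (Nothing if there is no match).
        Dim div As HtmlNode = doc.DocumentNode.Descendants("div").FirstOrDefault(Function(d) d.GetAttributeValue("class", "") = "fontdef1")

        ' If the div is found, print its trimmed inner text.
        If div IsNot Nothing Then
            Console.WriteLine(div.InnerText.Trim())
        End If
    End Sub
End Module
```

With the sample markup above this prints `Sample horoscope text.`. Because the target `<div>` in the question also carries `id="textline"`, matching on the id may be a more specific choice than matching on the class.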

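    [Comments]:

    • OK, I took your advice and tried to convert your example from Console to IO so that it outputs to my TextBox1. I changed Console.WriteLine(div.InnerText.Trim()) to Dim finish As String = (div.InnerText.Trim()). The problem now is that I don't know how to get the string "finish" into my Private Sub so I can use it with TextBox1. I tried calling Main, thinking the String would come along with it, but I guess that is not the case. Any suggestions? Thanks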
    • @RockGuitarist1 TextBox1.Text = finish ?
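    • I get an error saying finish is not declared. It seems I can't move it from Public Sub Main() to Private Sub Form1_Load
    • @RockGuitarist1 You don't need to modify your Public Sub Main(); just implement all the logic shown in this answer inside Private Sub Form1_Load
    • You're the best! Thank you so much!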
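The "not declared" error in the comments arises because `finish` was a local variable of `Main`, so it is not visible from `Form1_Load`. One way to follow the advice above might look like the sketch below. It assumes the designer-generated `Form1` from the question, with its `TextBox1` and `Label2` controls; the helper name `GetHoroscopeText` is hypothetical and not part of the original answer.

```vbnet
Imports System
Imports HtmlAgilityPack

Public Class Form1

    ' Hypothetical helper: returns the trimmed text of the target <div>,
    ' or an empty string if the page does not contain it.
    Private Function GetHoroscopeText(ByVal address As String) As String
        ' Download and parse the page in one step.
        Dim doc As HtmlDocument = New HtmlWeb().Load(address)
        ' Select the first <div> whose class attribute equals fontdef1.
        Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//div[@class='fontdef1']")
        If div Is Nothing Then
            Return String.Empty
        End If
        Return div.InnerText.Trim()
    End Function

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Label2.Text = Date.Now.ToString("MM/dd/yyyy")
        ' The function's return value carries the text into this Sub,
        ' so no variable has to be shared with another procedure.
        TextBox1.Text = GetHoroscopeText("http://my.horoscope.com/astrology/free-daily-horoscope-gemini.html")
    End Sub
End Class
```

Returning the text from a function, rather than assigning it to a local variable in a separate procedure, keeps the value in scope where the TextBox is updated.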