【Question title】: Click a link on a webpage resulting from submitting a form
【Posted】: 2017-02-16 22:38:59
【Question】:

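In the code below, I fill in a form and submit it on the website. Then I scrape the resulting data and write it to a CSV file (all of this works fine). However, the results page has a link with the text 'Later'. How can I click this link? I am using mechanize. I checked a similar question: this, but it doesn't fully answer my question.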

# import needed libraries
from mechanize import Browser
from datetime import datetime
from bs4 import BeautifulSoup
import csv

br = Browser()

# Ignore robots.txt
br.set_handle_robots(False)

# Send a browser-like User-agent; some sites reject requests that look like a robot
br.addheaders = [('User-agent', 'Chrome')]

# Open the SBB timetable query page (English version), saving the response
br.open('http://fahrplan.sbb.ch/bin/query.exe/en')


# Select the search form (nr=6 is the seventh form on the page) and fill in its text inputs
# (this section should eventually be automated to read multiple inputs)
br.select_form(nr=6)

br.form["REQ0JourneyStopsS0G"] = 'Eisenstadt'  # Origin train station (From)
br.form["REQ0JourneyStopsZ0G"] ='sarajevo'  # Destination train station (To)
br.form["REQ0JourneyTime"] = '5:30'  # Search Time
br.form["date"] = '18.01.17'  # Search Date

# Get the search results
br.submit()

# get the response from mechanize Browser
soup = BeautifulSoup(br.response().read(), 'html.parser', from_encoding="utf-8")
trs = soup.select('table.hfs_overview tr')

# scrape the contents of the results table to csv (one line per row that has a location cell)
with open('out.csv', 'w') as f:
    for tr in trs:

        locations = tr.select('td.location')
        if len(locations) > 0:
            location = locations[0].contents[0].strip()
            prefix = tr.select('td.prefix')[0].contents[0].strip()
            time = tr.select('td.time')[0].contents[0].strip()
            #print tr.select('td.duration').contents[0].strip()
            durations = tr.select('td.duration')
            #print durations
            if len(durations) == 0:
                duration = ''
                #print("oops! There aren't any durations.")
            else:
                duration = durations[0].contents[0].strip()
            f.write("{},{},{},{}\n".format(location.encode('utf-8'), prefix, time, duration))
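The question imports `csv` but builds each line by hand with `str.format`, which produces a broken row if a field ever contains a comma. A minimal Python 3 sketch of the same output using `csv.writer`, assuming each table row has already been reduced to the four strings `location`, `prefix`, `time`, `duration` as in the loop above; `write_rows` is a hypothetical helper and the sample rows are placeholders, not real SBB data:

```python
import csv
import io


def write_rows(rows, f):
    """Write (location, prefix, time, duration) tuples to a file-like object as CSV."""
    writer = csv.writer(f)  # quotes fields that contain commas, quotes or newlines
    writer.writerows(rows)


# Placeholder rows; the empty duration mirrors the `duration = ''` branch above
rows = [
    ("Station A", "dep", "05:42", "2:15"),
    ("Station B, Track 3", "arr", "07:57", ""),
]

buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())
```

For a real file, open it with `open('out.csv', 'w', newline='', encoding='utf-8')`, as the `csv` documentation recommends, and pass that file object to `write_rows`.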

【Question discussion】:

    Tags: python beautifulsoup


    【Answer 1】:

    The HTML of the Later link looks like this:

    <a accesskey="l" class="hafas-browse-link" href="http://fahrplan.sbb.ch/bin/query.exe/en?ld=std2.a&amp;seqnr=1&amp;ident=kv.047469247.1487285405&amp;REQ0HafasScrollDir=1" id="hfs_linkLater" title="Search for later connections">Later</a>
    

    You can get the URL with:

    In [22]: soup.find('a', text='Later')['href']
    Out[22]: u'http://fahrplan.sbb.ch/bin/query.exe/en?ld=std2.a&seqnr=1&ident=kv.047469247.1487285405&REQ0HafasScrollDir=1'
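Note that the raw HTML contains `&amp;` between the query parameters, while the value BeautifulSoup returns contains a plain `&`: the parser decodes HTML entities in attribute values. If the URL were instead cut out of the page source with a regular expression or string slicing, it would need to be unescaped by hand. A small Python 3 sketch with the standard-library `html.unescape`, using the href exactly as it appears above:

```python
import html

# href attribute exactly as it appears in the page source
raw_href = ("http://fahrplan.sbb.ch/bin/query.exe/en?ld=std2.a&amp;seqnr=1"
            "&amp;ident=kv.047469247.1487285405&amp;REQ0HafasScrollDir=1")

# decode &amp; back to & so the URL can be opened
url = html.unescape(raw_href)
print(url)
```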
    

    To make the browser go to that link, call br.open:

    In [21]: br.open(soup.find('a', text='Later')['href'])
    Out[21]: <response_seek_wrapper at 0x7f346a5da320 whose wrapped object = <closeable_response at 0x7f3469bee830 whose fp = <socket._fileobject object at 0x7f34697f26d0>>>
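If more than one extra page is needed, the same step can be repeated: parse the current page, look for the Later link, open it, and stop once no such link is present. Below is a minimal Python 3 sketch of that loop using only the standard library (`html.parser` in place of BeautifulSoup) so it runs without network access. `fetch` is a hypothetical callable that takes a URL and returns the page HTML as a string; with mechanize it could be something like `lambda url: br.open(url).read().decode('utf-8')`. `max_pages` is an assumed safety limit, since it is not known how many times the site keeps offering a Later link. For a single click, mechanize's own `br.follow_link(text='Later')` should also work.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextFinder(HTMLParser):
    """Record the href of the first <a> whose visible text equals `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.href = None            # result; stays None if no matching link
        self._current_href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already decodes entities such as &amp; in attribute values
        if tag == "a" and self.href is None:
            self._current_href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            if "".join(self._text).strip() == self.target:
                self.href = self._current_href
            self._current_href = None


def find_link(page_html, text):
    """Return the href of the first link with the given text, or None."""
    finder = LinkTextFinder(text)
    finder.feed(page_html)
    finder.close()
    return finder.href


def follow_later_links(fetch, start_url, max_pages=5):
    """Yield (url, html) for the start page and each page reached via 'Later'."""
    url = start_url
    for _ in range(max_pages):
        page = fetch(url)
        yield url, page
        href = find_link(page, "Later")
        if href is None:
            break
        # absolute hrefs pass through unchanged; relative ones are resolved
        url = urljoin(url, href)


# Usage with fake in-memory pages instead of real network requests
pages = {
    "http://example.test/q": '<a href="/q?page=2">Later</a>',
    "http://example.test/q?page=2": '<a href="/q?page=3">Later</a>',
    "http://example.test/q?page=3": "<p>no more results</p>",
}
visited = [url for url, _ in follow_later_links(pages.__getitem__, "http://example.test/q")]
print(visited)
```

The `urljoin` call only matters if the site returns relative hrefs; an absolute URL like the one shown in this answer passes through unchanged.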
    
