Extract specific text from a URL in Python

【Question title】: Extract specific text from URL Python
【Posted】: 2018-01-22 03:40:39
【Question description】:

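I'm trying to extract specific text from a number of URLs that are returned. I'm using Python 2.7 with requests and BeautifulSoup.

The reason is that I need to find the latest URL, which can be identified by the highest number in "DF_7", where 7 is the highest among the URLs below. That URL will then be downloaded. Note that new files are added every day, which is why I need to check for the highest-numbered file.

Once I've found the highest number in the list of URLs, I need to join "https://service.rl360.com/scripts/customer.cgi/SC/servicing/" to the URL with the highest number. The end result should look like this: https://service.rl360.com/scripts/customer.cgi/SC/servicing/downloads.php?Reference=DF_7&SortField=ExpiryDays&SortOrder=Ascending

The URLs look like this, with only the DF_ number incrementing each time

Is this the right approach? If so, how do I do it?

Thanks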

import base
import requests
import zipfile, StringIO, re
from lxml import html
from bs4 import BeautifulSoup

from base import os

from django.conf import settings

# Fill in your details here to be posted to the login form.
payload = {
    'USERNAME': 'xxxxxx',
    'PASSWORD': 'xxxxxx',
    'option': 'login'
}

headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

# Use 'with' to ensure the session context is closed after use.

with requests.Session() as s:
    # Log in first; the session reuses any cookies the server sets.
    p = s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)

    # An authorised request.
    r = s.get('https://service.rl360.com/scripts/customer.cgi/SC/servicing/downloads.php?Folder=DataDownloads&SortField=ExpiryDays&SortOrder=Ascending', stream=True)
    content = r.text
    soup = BeautifulSoup(content, 'lxml')
    table = soup.find('table')
    links = table.find_all('a')
    print(links)

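【Question discussion】:

Do you have any code for this?
Yes, I'm editing my post now.
Can you add the links that the script prints?
Is the link you want always the last link on the page with the class tabletd?

Tags: python python-2.7 beautifulsoup python-requests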

【Solution 1】:

You can go straight to the last link with the class "tabletd" and print its href value like this:

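# Take the href of the last <a class="tabletd"> element on the page.
href = soup.find_all("a", {'class': 'tabletd'})[-1]['href']
# Prepend the servicing base URL to build the full download link.
base = "https://service.rl360.com/scripts/customer.cgi/SC/servicing/"
print(base + href)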
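
Taking the last link works only if the newest file is always listed last. If the order is not guaranteed, the highest DF_ number can be picked explicitly. The following is a minimal sketch, not a tested solution for this site: it assumes every relevant href contains "Reference=DF_<number>" as in the example URL from the question, and the function name latest_download_url and the sample hrefs are hypothetical. The numbers are compared as integers, because comparing them as strings would rank DF_7 above DF_10.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

BASE = "https://service.rl360.com/scripts/customer.cgi/SC/servicing/"
# Assumption: the file number follows "Reference=DF_" in each href.
DF_PATTERN = re.compile(r"Reference=DF_(\d+)")


def latest_download_url(hrefs, base=BASE):
    """Return the absolute URL for the href with the highest DF_ number.

    Returns None when no href matches the pattern.
    """
    best_href, best_num = None, -1
    for href in hrefs:
        match = DF_PATTERN.search(href)
        if match and int(match.group(1)) > best_num:
            best_href, best_num = href, int(match.group(1))
    if best_href is None:
        return None
    return urljoin(base, best_href)


# Hypothetical hrefs, deliberately out of order.
sample_hrefs = [
    "downloads.php?Reference=DF_5&SortField=ExpiryDays&SortOrder=Ascending",
    "downloads.php?Reference=DF_10&SortField=ExpiryDays&SortOrder=Ascending",
    "downloads.php?Reference=DF_7&SortField=ExpiryDays&SortOrder=Ascending",
]
print(latest_download_url(sample_hrefs))
```

With the soup object from the question, the input list could be built with [a['href'] for a in soup.find_all('a', href=True)] and passed to the function in place of sample_hrefs.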

【Discussion】: