Python Beautiful Soup: importing URLs (answers)

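【Question title】: python beautiful soup import urls
【Posted】: 2016-03-01 02:58:36
【Question】:

I am trying to import a list of URLs and grab pn2 and main1 from each one. I can run it without importing the file, so I know the scraping works, but I can't figure out how to handle the import. Here is my latest attempt, and below it is a small sample of the URLs. Thanks in advance.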

import urllib
import urllib.request
import csv
from bs4 import BeautifulSoup

csvfile = open("ecco1.csv")
csvfilelist = csvfile.read()
theurl="csvfilelist"

soup = BeautifulSoup(theurl,"html.parser")
for row in csvfilelist:

    for pn in soup.findAll('td',{"class":"productText"}):
        pn2.append(pn.text)
    for main in soup.find_all('div',{"class":"breadcrumb"}):
        main1 = main.text

        print (main1)
        print ('\n'.join(pn2))

URLs:
http://www.eccolink.com/products/productresults.aspx?catId=2458
http://www.eccolink.com/products/productresults.aspx?catId=2464
http://www.eccolink.com/products/productresults.aspx?catId=2435
http://www.eccolink.com/products/productresults.aspx?catId=2446
http://www.eccolink.com/products/productresults.aspx?catId=2463

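【Question comments】:

What problem are you running into? Maybe you want csvfile.readlines()
I'm not getting an error, but I'm not getting any results either
Tried csvfile.readlines(), still no results
What is for row in csvfilelist supposed to do? The loop variable row is never used below it
To be honest, I copied it from an answer to a similar question on this site

Tags: python python-3.x web-scraping beautifulsoup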

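【Solution 1】:

As far as I can tell, you are opening a CSV file and parsing it with BeautifulSoup. That is not how it should be done. BeautifulSoup parses HTML, not CSV.

Looking at your code, the parsing part seems correct as long as you pass HTML to bs4.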

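from bs4 import BeautifulSoup
import requests

links = []
# requests.get returns a Response object; BeautifulSoup needs its .text
html = requests.get('http://www.example.com')
soup = BeautifulSoup(html.text, 'html.parser')
# Open the output file in write mode (the default mode is read-only)
with open('links.txt', 'w') as file:
    for x in soup.find_all('a', {"class": "abc"}):
        links.append(x)
        # write() needs a string, so convert the Tag before writing
        file.write(str(x) + '\n')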

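The above is a very basic implementation of how I would get the target elements out of the HTML and write them to a file and/or append them to a list. Use requests instead of urllib. It is a better, more modern library.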
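Applied to the selectors from the question, the same idea might look like the sketch below. The names extract and scrape are hypothetical helpers, the sample HTML and part numbers are made up for illustration, and the real page layout has not been checked, so treat this as a starting point rather than a verified scraper.

```python
from bs4 import BeautifulSoup


def extract(html):
    """Return (breadcrumb_text, part_numbers) for one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Selectors copied from the question; the live page is not verified here.
    crumb = soup.find("div", {"class": "breadcrumb"})
    breadcrumb = crumb.get_text(strip=True) if crumb else ""
    part_numbers = [td.get_text(strip=True)
                    for td in soup.find_all("td", {"class": "productText"})]
    return breadcrumb, part_numbers


def scrape(url):
    """Download one page and extract its data (needs the requests package)."""
    import requests  # imported here so extract() can be tried offline
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract(response.text)


# Hypothetical HTML fragment, invented purely to exercise extract().
sample = """
<div class="breadcrumb">Home &gt; Lighting</div>
<table><tr><td class="productText">PN-001</td>
<td class="productText">PN-002</td></tr></table>
"""
main1, pn2 = extract(sample)
print(main1)
print("\n".join(pn2))
```

Building a fresh pn2 list per page keeps part numbers from one URL from leaking into the next, which is easy to get wrong when appending to a single global list.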

If you want to take your input data from a CSV file, your best bet is to read it with the csv module's reader.
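A minimal sketch of that step, assuming the file holds one URL per cell (the actual layout of ecco1.csv is not shown in the question). The helper name read_urls is hypothetical, and two URLs from the question stand in for the file's contents.

```python
import csv


def read_urls(lines):
    """Return every cell that looks like a URL, in input order.

    `lines` can be an open file or any iterable of text lines.
    """
    urls = []
    for row in csv.reader(lines):
        for cell in row:
            cell = cell.strip()
            if cell.startswith("http"):
                urls.append(cell)
    return urls


# Stand-in for the file's contents: two URLs taken from the question.
sample_lines = [
    "http://www.eccolink.com/products/productresults.aspx?catId=2458\n",
    "http://www.eccolink.com/products/productresults.aspx?catId=2464\n",
]
urls = read_urls(sample_lines)
print(urls)
```

With a real file, the call would be wrapped in with open("ecco1.csv", newline="") as f: urls = read_urls(f). Each URL can then be fetched with requests.get(url).text and the resulting HTML, not the CSV text, handed to BeautifulSoup.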

Hope this helps.

【Comments】: