[Question title]: Parsing through Links stored in CSV file
[Posted on]: 2019-06-07 09:11:14
[Question description]:

I am trying to parse the links stored in my csv file and then print the title of each one. When I try to read the links and parse them to get each title, I run into a problem near the bottom of my code.

import csv
from bs4 import BeautifulSoup
from urllib.request import urlopen

contents = []

filename = 'scrap.csv'

with open(filename,'rt') as f:
    data = csv.reader(f)

    for row  in data:
        links = row[0]
        contents.append(links) #add each url to list of contents

for links in contents: #parse through each url in the list contents
    url = urlopen(links[0].read())
    soup = BeautifulSoup(url,"html.parser")

for title in soup.find_all('title'):
    print(title)

I expect the output to be the title printed for each row, but I get the following error:

    line 17, in
        url = urlopen(links[0].read())
    AttributeError: 'str' object has no attribute 'read'
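For reference, csv.reader yields each row as a list of plain strings, so by the time the second loop runs, links is already a full URL string and links[0] is only its first character. A minimal stdlib sketch (the sample URLs are hypothetical stand-ins for scrap.csv):

```python
import csv
import io

# Simulated scrap.csv contents: one URL per line (hypothetical)
sample = "https://example.com/a\nhttps://example.com/b\n"
rows = list(csv.reader(io.StringIO(sample)))

# Each row is a list of strings; row[0] is the whole URL string
assert rows[0][0] == "https://example.com/a"

# Indexing a string gives a single character, so links[0] in the
# question's second loop is just "h", not a URL
links = rows[0][0]
assert links[0] == "h"

# Strings have no .read() method, which is exactly the AttributeError
assert not hasattr(links, "read")
```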

[Question discussion]:

  • Why are you calling read on a string? It is already the URL you need.
  • Sorry, I'm new to Python. What do you suggest, @satyamsoni?
  • Use url = urlopen(links[0]) directly.

Tags: python csv screen-scraping


[Solution 1]:

Change url = urlopen(links[0].read()) to url = urlopen(links).read(). Inside the second loop, links is already a full URL string, so links[0] is only its first character and has no .read() method; open the URL first, then call .read() on the response object it returns.
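The open-then-read order can be checked without network access by using a data: URL as a stand-in for the real links from scrap.csv (a sketch; the URL below is hypothetical):

```python
from urllib.request import urlopen

# A data: URL lets urlopen run offline; it stands in for a real link
link = "data:text/html,<html><head><title>Example</title></head></html>"

# The fix: open the URL first, then read the response bytes
body = urlopen(link).read()
assert b"<title>Example</title>" in body

# The original urlopen(links[0].read()) fails because links[0] is a
# one-character string, and strings have no .read() method
```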

[Discussion]:

    [Solution 2]:

    Try this code. It should work and reduce your overhead.

    import pandas as pd
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # header=None because scrap.csv has no header row; column 0 holds the URLs
    for link in pd.read_csv('scrap.csv', header=None)[0].values:
        url = urlopen(link)
        soup = BeautifulSoup(url, "html.parser")
    

    [Discussion]:

      [Solution 3]:
      import csv
      from bs4 import BeautifulSoup
      import requests
      
      contents = []
      
      def soup_title(soup):
          # return the first <title> tag found in the parsed page
          for title in soup.find_all('title'):
              return title
      
      filename = 'scrap.csv'
      
      with open(filename, 'rt') as f:
          data = csv.reader(f)
      
          for row in data:
              links = row[0]
              contents.append(links)  # add each url to the list of contents
      
      for links in contents:  # parse each url in the list contents
          url = requests.get(links)
          soup = BeautifulSoup(url.text, "html.parser")
          brand_info = soup_title(soup)
          print(brand_info)
      
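If bs4 is not available, the title-extraction step above can be sketched with the stdlib html.parser alone (the HTML string below is a hypothetical stand-in for a fetched page):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside <title>...</title>, a stdlib
    stand-in for soup.find_all('title')."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data)

# Hypothetical page body in place of a real HTTP response
html = "<html><head><title>Example Page</title></head></html>"
parser = TitleParser()
parser.feed(html)
assert parser.titles == ["Example Page"]
```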

      [Discussion]:
