【Question title】: Export data from BeautifulSoup to CSV
【Posted】: 2018-02-12 01:30:43
【Question】:

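[Disclaimer] I have read many other answers on this topic, but none of them seem to work for me.

I want to export the data I have scraped to a CSV file.

My question is: how do I write the code that outputs the data to a CSV?

Current code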

import requests
from bs4 import BeautifulSoup

url = "http://implementconsultinggroup.com/career/#/6257"
r = requests.get(url)  # fetch the page once

soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("a")

for link in links:
    href = link.get("href") or ""  # some <a> tags have no href attribute
    if "career" in href and 'COPENHAGEN' in link.text:
        print("<a href='%s'>%s</a>" % (href, link.text))

Code output

View Position

</a>
<a href='/career/management-consultants-to-help-our-customers-succeed-with-
it/'>
Management consultants to help our customers succeed with IT
COPENHAGEN • At Implement Consulting Group, we wish to make a difference in 
the consulting industry, because we believe that the ability to create Change 
with Impact is a precondition for success in an increasingly global and 
turbulent world.




View Position

</a>
<a href='/career/management-consultants-within-process-improvement/'>
Management consultants within process improvement
COPENHAGEN • We are looking for consultants with profound
experience in Six Sigma, Lean and operational
management

Code I tried

with open('ImplementTest1.csv',"w") as csv_file:
     writer = csv.writer(csv_file)
     writer.writerow(["link.get", "link.text"])
     csv_file.close()

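Desired CSV output

Column 1: URL link

Column 2: Job description

For example

Column 1: /career/management-consultants-to-help-our-customers-succeed-with-it/

Column 2: Management consultants to help our customers succeed with IT COPENHAGEN • At Implement Consulting Group, we wish to make a difference in the consulting industry, because we believe that the ability to create Change with Impact is a precondition for success in an increasingly global and turbulent world.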
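
The desired Column 2 is a single line, while the scraped link text spans several lines and carries the "View Position" label. A minimal sketch of how that text could be flattened before it is written, using only the standard library; the helper name clean_text and the sample string are hypothetical and only mimic the output shown earlier:

```python
import csv
import io


def clean_text(raw, label="View Position"):
    """Drop the link label and collapse every whitespace run to one space."""
    without_label = raw.replace(label, "")
    # str.split() with no arguments splits on any whitespace, including newlines
    return " ".join(without_label.split())


# Hypothetical sample mimicking the multi-line link text (\u2022 is the bullet)
sample = (
    "\nManagement consultants within process improvement\n"
    "COPENHAGEN \u2022 We are looking for consultants\n\n\n"
    "View Position\n\n"
)

buffer = io.StringIO()  # in-memory stand-in for a real CSV file
writer = csv.writer(buffer)
writer.writerow(["job_link", "job_desc"])
writer.writerow(
    ["/career/management-consultants-within-process-improvement/", clean_text(sample)]
)
csv_text = buffer.getvalue()  # header line plus one single-line data row
```

Collapsing whitespace this way keeps each job on one CSV line; if line breaks inside the description matter, the csv module can also store them as-is inside a quoted field.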

【Question comments】:

  • You have to store the results in a list.
  • Thanks Adam. I'm very new to Python; could you quickly show how to create/store the results as a list?
  • Here is my answer to a similar question: extract-data-from-html-to-csv-using-beautifulsoup
  • So I only need to add this piece? tables = soup.find_all('table') data = [] for table in tables: previous = table.find_previous_siblings('h2') id = previous[0].get('id') if previous else None rows = [td.get_text(strip=True) for td in table.find_all('td')] data.append([id] + rows)
  • Or which parts of the code you wrote are relevant to my case?
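
The first comment's suggestion, storing the results in a list before writing them, could look roughly like the following. This is a sketch and not the commenter's code: collect_rows is a hypothetical helper, and the sample pairs stand in for what ((a.get("href"), a.text) for a in soup.find_all("a")) would yield on the real page.

```python
import csv
import io


def collect_rows(pairs):
    """Return [href, text] rows for Copenhagen career links.

    `pairs` is any iterable of (href, text) tuples; href may be None
    because an <a> tag without an href attribute yields None from .get().
    """
    rows = []  # the list that accumulates the results
    for href, text in pairs:
        if href and "career" in href and "COPENHAGEN" in text:
            rows.append([href.strip(), text.strip()])
    return rows


# Hypothetical stand-in data; with BeautifulSoup the pairs would come from
# ((a.get("href"), a.text) for a in soup.find_all("a"))
sample_pairs = [
    ("/career/job-a/", "Job A COPENHAGEN - description"),
    ("/about/", "About us"),
    (None, "COPENHAGEN anchor without href"),
    ("/career/job-b/", "Job B STOCKHOLM - description"),
]

rows = collect_rows(sample_pairs)

out = io.StringIO()  # swap for a real file object to write to disk
writer = csv.writer(out)
writer.writerow(["job_link", "job_desc"])  # header row
writer.writerows(rows)  # one call writes every collected row
print(out.getvalue())
```

Keeping the rows in a list separates scraping from writing, so the same list can be inspected, sorted, or de-duplicated before anything reaches the CSV.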

Tags: python-2.7 parsing web-scraping beautifulsoup export-to-csv


【Solution 1】:

Try this script to get the CSV output:

import csv
import requests
from bs4 import BeautifulSoup

# newline='' is a Python 3 argument; it lets the csv module control line endings
outfile = open('career.csv', 'w', newline='')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])  # header row

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res, "lxml")
links = soup.find_all("a")

for link in links:
    href = link.get("href") or ""  # some <a> tags have no href attribute
    if "career" in href and 'COPENHAGEN' in link.text:
        item_link = href.strip()
        item_text = link.text.replace("View Position", "").strip()
        writer.writerow([item_link, item_text])
        print(item_link, item_text)

outfile.close()

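【Comments】:

  • Thanks Shahin, this does exactly what I wanted. The only part that did not work was the last line, outfile.close(): File "", line 7 outfile.close() ^ SyntaxError: invalid syntax
  • I think that is because you use Python 2 and I use Python 3. I'm not sure, though! In any case, it ran perfectly in the end.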
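
On the Python 2 versus Python 3 point raised in the comments: one difference that does matter here is how the csv module expects the output file to be opened. Python 3 wants text mode with newline='', while Python 2's built-in open() has no newline argument and the csv module expects binary mode there. That mismatch would surface as a TypeError, so it does not by itself explain the SyntaxError quoted above. A hedged sketch of a helper that picks the mode at run time; open_csv_for_writing, the file location, and the sample row are hypothetical, and on Python 2 non-ASCII text such as the • character would also need encoding (for example to UTF-8) before it is passed to the writer.

```python
import csv
import os
import sys
import tempfile


def open_csv_for_writing(path):
    """Open `path` the way the csv module expects on the running interpreter."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode; newline="" lets csv control line endings itself
        return open(path, "w", newline="", encoding="utf-8")
    # Python 2: the built-in open() has no newline argument; use binary mode
    return open(path, "wb")


# Hypothetical output location inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "career.csv")

# The with block closes the file on exit, so no explicit close() call is needed
with open_csv_for_writing(path) as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["job_link", "job_desc"])
    writer.writerow(["/career/example-job/", "Example job COPENHAGEN"])

print(outfile.closed)
```

Because the with statement closes the file automatically, the trailing outfile.close() line is no longer needed in this variant.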