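【Question title】: Save array output to csv in Python [duplicate]
【Posted】: 2019-07-02 01:15:29
【Question】:

I'm trying to scrape data from a website. I extract the data in a loop and store it in variables, but I can't save it to a csv file. Being new to Python and BeautifulSoup, I haven't gotten very far. Here is the code: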

import requests
from bs4 import BeautifulSoup
import csv

r = "https://sofia.businessrun.bg/en/results-2018/"
content = requests.get(r)

soup = BeautifulSoup(content.text, 'html.parser')


for i in range (1,5):
    team_name= soup.find_all(class_="column-3")
    team_time= soup.find_all(class_="column-5")


for i in range (1,5):
  print (team_name[i].text)
  print (team_time[i].text)

with open("new_file.csv","w+") as my_csv:
    csvWriter = csv.writer(my_csv,delimiter=',')
    csvWriter.writerows(team_name)

Any help would be greatly appreciated!

【Comments】:

  • What happens when you run it? Is there an error?

Tags: python arrays file csv save


【Solution 1】:

I found another way to do the scraping and save the result to a csv using pandas. The code is below:

import requests

# Added: pandas builds the table and writes the csv
import pandas as pd

from bs4 import BeautifulSoup

r = "https://sofia.businessrun.bg/en/results-2018/"
content = requests.get(r)

soup = BeautifulSoup(content.text, 'html.parser')


# find_all already returns every match, so no loop is needed here
team_name = soup.find_all(class_="column-3")
team_time = soup.find_all(class_="column-5")

# Convert the tags to strings (the HTML markup is stripped further down)
tn_list = [str(x) for x in team_name]
tt_list = [str(x) for x in team_time]

# Preview the first few results (index 0 is the header row)
for i in range(1, 5):
    print(team_name[i].text)
    print(team_time[i].text)

# I put the result in a dataframe
df = pd.DataFrame({"teamname" : tn_list, "teamtime" : tt_list})

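# I use regex to clean your data (get rid of the html tags).
# regex=True is needed because pandas 2.0+ treats the pattern as a literal string by default
df.teamname = df.teamname.str.replace("<[^>]*>", "", regex=True)
df.teamtime = df.teamtime.str.replace("<[^>]*>", "", regex=True)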

# The first row is actually the column name
df.columns = df.iloc[0]
df = df.iloc[1:]

# I send it to a csv
df.to_csv(r"path\to\new_file.csv")

This should work fine.
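For comparison, here is a minimal sketch that stays with the `csv` module from the question instead of pandas. `csv.writer.writerows` expects each row to be a sequence of field values, so the names and times have to be paired into rows of plain text first. The function name `write_results` and the sample data are hypothetical, used only for illustration; in the actual script the two lists would presumably be built from the scraped tags, for example with `tag.get_text(strip=True)`.

```python
import csv


def write_results(path, names, times):
    """Write paired team names and times to a CSV file, one team per row."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # zip pairs the i-th name with the i-th time; each pair becomes one row
        writer.writerows(zip(names, times))


# Hypothetical sample data standing in for the scraped cells. In the real
# script these lists could come from something like
# [tag.get_text(strip=True) for tag in soup.find_all(class_="column-3")]
names = ["Team", "Alpha Runners", "Beta Sprinters"]
times = ["Time", "00:58:12", "01:02:45"]

write_results("new_file.csv", names, times)
```

Since the first scraped cell in each column is the header, the first row of the file doubles as the csv header line.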

【Comments】:

  • Works great! Thanks!