【Question Title】:Python and Web Scraping - CSV Output Issues
【Posted】:2019-05-22 00:41:41
【Question Description】:

I am currently trying to run a Python script to pull some data from the Yahoo Fantasy Football site. I am able to scrape the data successfully, but I am running into a problem with the CSV output: all of the data ends up in a single column instead of several separate columns. Here is the code I am using:

import re, time, csv
import requests
from bs4 import BeautifulSoup

#Variables
League_ID = 1459285
Week_Number = 1
Start_Week = 1
End_Week = 13
Team_Name = "Test"
Outfile = 'Team_Stats.csv'
Fields = ['Player Name', 'Player Points', 'Player Team', 'Week']


with open('Team_Stats.csv', 'w') as Team_Stats:
    f = csv.writer(Team_Stats, Fields, delimiter=',', lineterminator='\n')
    f.writerow(Fields)

    for Week_Number in range(Start_Week, End_Week + 1):
        url = requests.get("https://football.fantasysports.yahoo.com/f1/" + str(League_ID) + "/2/team?&week=" + str(Week_Number))
        soup = BeautifulSoup(url.text, "html.parser")
        #print("Player Stats for " + Team_Name + " for Week " + str(Week_Number))

        player_name = soup.find_all('div', {'class': 'ysf-player-name'})
        player_points = soup.find_all('a', {'class': 'pps Fw-b has-stat-note '})

        for player_name in player_name:
            player_name = player_name.contents[0]
            #print(div.text)
            f.writerow(player_name)

        for player_points in player_points:
            #print(div.text)
            Week_Number += 1
            f.writerow(player_points)

    Team_Stats.flush()
    Team_Stats.close()
    print("Process Complete")

I would also like to leave room in the code to add more "for" loops, since I have other data to collect as well.

If anyone can suggest a better way to structure my code, please feel free to help!

Here is a sample of the output I am getting in the csv file

Thanks

【Question Comments】:

  • You seem to already know that writerow takes a list as its argument. You have 2 for loops writing to the file serially. Collect the results in a nested list and then write one row at a time.
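That comment points at the core of the symptom: csv.writerow iterates whatever you give it, so a plain string is split into one cell per character, while a list gives one cell per element. A minimal standalone sketch (using a made-up player name, not the Yahoo page):

```python
import csv
import io

buf = io.StringIO()
w = csv.writer(buf)
w.writerow("Tom Brady")            # a string: every character becomes its own cell
w.writerow(["Tom Brady", "28.4"])  # a list: one cell per element
print(buf.getvalue())
```

The first row comes out as `T,o,m, ,B,r,a,d,y`, the second as `Tom Brady,28.4`, which is why passing the scraped tag or its string contents directly to writerow scatters the data across columns.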

标签: python web web-scraping screen-scraping


【Solution 1】:
import re, time, csv
import requests
from bs4 import BeautifulSoup

#Variables
League_ID = 1459285
Week_Number = 1
Start_Week = 1
End_Week = 13
Team_Name = "Test"
Outfile = 'Team_Stats.csv'
Fields = ['Player Name', 'Player Points', 'Player Team', 'Week']


with open('Team_Stats.csv', 'w', newline='') as Team_Stats:
    # note: Fields must not be passed positionally to csv.writer
    # (that slot is the dialect); it is only written out as the header row
    f = csv.writer(Team_Stats, delimiter=',', lineterminator='\n')
    f.writerow(Fields)

    for Week_Number in range(Start_Week, End_Week + 1):
        url = requests.get("https://football.fantasysports.yahoo.com/f1/" + str(League_ID) + "/2/team?&week=" + str(Week_Number))
        soup = BeautifulSoup(url.text, "html.parser")
        #print("Player Stats for " + Team_Name + " for Week " + str(Week_Number))

        player_name = soup.find_all('a', {'class': 'Nowrap name F-link'})
        player_points = soup.find_all('a', {'class': 'pps Fw-b has-stat-note '})

        for pn, pp in zip(player_name, player_points):
            # pair each name tag with its points tag and write them as one row
            player_name = pn.contents[0]
            player_points = pp.contents[0]
            f.writerow([player_name, player_points])

# the with block closes the file; no explicit flush()/close() needed
print("Process Complete")

1) The wrong class was being scraped for player_name

2) I used zip() to iterate over both lists at once, constructing one row that contains the name and the points
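The zip() pairing can be shown on its own; the names and points below are made-up stand-ins for the two lists of scraped tags:

```python
names = ["Tom Brady", "Derrick Henry"]  # stand-in for the scraped name tags
points = ["28.4", "19.0"]               # stand-in for the scraped points tags

# zip pairs the i-th name with the i-th point, so each CSV row becomes
# [name, points] instead of writing the two lists one after the other
rows = [[n, p] for n, p in zip(names, points)]
print(rows)  # → [['Tom Brady', '28.4'], ['Derrick Henry', '19.0']]
```

One caveat: zip stops silently at the shorter list, so if the two selectors match different numbers of tags, players are dropped or misaligned without any error.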

【Comments】:

  • I tried this solution but it still does not give me the correct output. If you can see the image I posted, I just want to move the numbers into the column to the right, and so on...
  • I tested this code; the fault is not in the CSV writing. You at least have to work out the correct scraping so that the correct row is constructed as a list. There is no problem with the CSV output, there is a problem with the input (the scraped data)
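One way to act on that comment is to build a complete row per player and let csv.DictWriter enforce the column order declared in Fields. This is a sketch with placeholder data, since the correct selectors for team and week are still open; in the real script each dict would be filled from one scraped player:

```python
import csv
import io

fields = ['Player Name', 'Player Points', 'Player Team', 'Week']

# placeholder rows; values here are invented for illustration only
players = [
    {'Player Name': 'Tom Brady', 'Player Points': '28.4',
     'Player Team': 'NE', 'Week': 1},
]

buf = io.StringIO()  # swap in the real file handle in the script
writer = csv.DictWriter(buf, fieldnames=fields, lineterminator='\n')
writer.writeheader()
writer.writerows(players)
print(buf.getvalue())
```

Because each player is one dict, adding more scraped attributes later only means adding a key, which also leaves room for the extra for loops the asker mentioned.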