[Question Title]: How to append my web scraping data into a list?
[Posted]: 2021-11-29 05:56:27
[Question Description]:

Using Python, I want to scrape this site for the heights of all the players: https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster. Here is my code. I can extract the heights successfully, but I can't append them to a list. When I try, the list comes out as ['6'3"', '6'3"', '6'3"'] instead of the 23 heights I need. What am I doing wrong?

import requests
from bs4 import BeautifulSoup

url = 'https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

#Creating empty lists to hold the heights in inches and original scraped heights
height_inches = []
height_list = []

heights = soup.find_all('span', class_= "sidearm-roster-player-height")

#For some reason, the 23 heights printed twice, hence -23 from the length
for i in range(0, (len(heights)-23)):
  {
      print(heights[i].get_text())
  }
#^This line of code allows me to see all the heights in a normal list^

#Trying to append the newly found heights in a list
height_list.append(heights[i].get_text())
print(height_list)

[Discussion]:

  • The list.append(x) line should be inside the for loop.
  • for i in heights: height_list.append(i.get_text()). Also, you don't need the curly braces in Python.

Tags: python list web-scraping append


[Solution 1]:

When building a list with a for loop, the .append call needs to happen inside the loop body:

import requests
from bs4 import BeautifulSoup

url = 'https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# Empty lists to hold the heights in inches and the original scraped heights
height_inches = []
height_list = []

heights = soup.find_all('span', class_= "sidearm-roster-player-height")

# Each height appears twice in the page markup, so keep only the
# first half of the matches (more robust than hard-coding -23)
for height in heights[:len(heights) // 2]:
    height_text = height.get_text()
    height_list.append(height_text)
    print(height_text)

print(height_list)
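To see why the placement of .append matters, here is a minimal, self-contained sketch with a plain list standing in for the parsed spans (no network access needed). The to_inches helper at the end is hypothetical, not part of the original post; it is one way the unused height_inches list could eventually be filled:

```python
# A stand-in for the text of the scraped height spans
scraped = ['6\'3"', '5\'11"', '6\'0"']

# Wrong: append runs once, after the loop has finished, so only the
# last item (still bound to the loop variable) ends up in the list
wrong = []
for item in scraped:
    pass
wrong.append(item)  # wrong == ['6\'0"']

# Right: append runs once per iteration, inside the loop body
right = []
for item in scraped:
    right.append(item)  # right == scraped

# Equivalent one-liner: a list comprehension
comprehension = [item for item in scraped]

# Hypothetical helper for converting a height like 6'3" to 75 inches
def to_inches(height: str) -> int:
    feet, inches = height.rstrip('"').split("'")
    return int(feet) * 12 + int(inches)
```

The same pattern applies to the scraped spans: the comprehension form would be `height_list = [h.get_text() for h in heights[:len(heights) // 2]]`.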

[Discussion]:
