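【Posted】: 2020-06-17 14:48:08
【Question】:
My goal is to extract the data from the table on this page: https://www.coteur.com/match/cotes-start-stromsgodset-rid1106841.html
The data is stored in tr tags. After collecting all the tr elements with XPath, I checked the number of elements in each of the first 3 rows, but the result is empty. If my code were correct, I should get [6, 6, 6].
Here is my code: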
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pandas as pd
url = 'https://www.coteur.com/match/cotes-start-stromsgodset-rid1106841.html'
#Create a handle , page, to handle the contents of the first soccer game
page = requests.get(url)
#Store the contents of the website under doc
doc = lh.fromstring(page.content)
#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')
#Check the length of the first 3 rows
a = [len(T) for T in tr_elements[:3]]
print(a)
Here is the output:
hao@hao-ThinkPad-T420:~$ ./extractodds.py
[]
【Discussion】:
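An empty list from `[len(T) for T in tr_elements[:3]]` means `tr_elements` itself is empty, so the HTML returned by `requests.get` contained no `<tr>` at all. One plausible cause, not confirmed here, is that the table is filled in by JavaScript after the page loads. The sketch below reproduces both outcomes offline with the same lxml calls. The helper `row_lengths` and the two HTML strings are made up for illustration and are not taken from the site.

```python
import lxml.html as lh


def row_lengths(html, n=3):
    """Return the number of child elements in each of the first n <tr> rows."""
    doc = lh.fromstring(html)
    return [len(tr) for tr in doc.xpath('//tr')[:n]]


# Hypothetical static page: a table with three rows of six cells each.
static_html = (
    "<html><body><table>"
    + "".join("<tr>" + "<td>x</td>" * 6 + "</tr>" for _ in range(3))
    + "</table></body></html>"
)

# Hypothetical page whose rows would only be added by a script at runtime,
# so the raw HTML contains no <tr> elements.
empty_html = (
    "<html><body><div id='odds'></div>"
    "<script>/* rows added at runtime */</script></body></html>"
)

print(row_lengths(static_html))  # [6, 6, 6]
print(row_lengths(empty_html))   # []
```

To check which case applies to the real page, printing `len(tr_elements)` and testing `'<tr' in page.text` would show whether any rows arrived in the response. If none did, a browser-driven tool such as the already imported selenium may be needed to get the rendered HTML.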