Importing data from tr tags with Python: answers

【Question title】: Import data from tr balise with python
【Posted】: 2020-06-17 14:48:08
【Question description】:

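My goal is to extract the data from the table on this page: https://www.coteur.com/match/cotes-start-stromsgodset-rid1106841.html

The data is stored in tr tags ("balise" is French for tag). After selecting all tr elements with XPath, I checked the number of child elements in each of the first 3 rows, but the result is empty. If my code were correct, I should get [6, 6, 6].

Here is my code: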

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pandas as pd

url = 'https://www.coteur.com/match/cotes-start-stromsgodset-rid1106841.html'

#Create a handle , page, to handle the contents of the first soccer game
page = requests.get(url)

#Store the contents of the website under doc
doc = lh.fromstring(page.content)

#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')

#Check the length of the first 3 rows
a = [len(T) for T in tr_elements[:3]]
print(a)

Here is the output:

hao@hao-ThinkPad-T420:~$ ./extractodds.py 
[]

【Comments】:

Tags: python xpath extract tr

【Solution 1】:

You should fix your XPath expression. You need to select the tr elements from within the table:

//table[@id="TableCoteHistory"]//tr[@class and @role]

Output: 11 elements, each of length 6.
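
To illustrate what this expression selects, here is a minimal sketch on a hand-written table. The markup is hypothetical: the real page's HTML is not shown in this thread, so the class and role values and the cell contents are assumptions. To stay runnable without third-party packages it uses the standard library's xml.etree.ElementTree instead of lxml. ElementTree's XPath subset has no `and` operator, so the predicate is written as two chained predicates, `[@class][@role]`, which select the same rows. With lxml, the original expression can be passed to doc.xpath() unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; class/role values and cell contents are assumptions.
SAMPLE = """
<html><body>
<table id="TableCoteHistory">
  <thead>
    <tr><th>Bookmaker</th><th>1</th><th>N</th><th>2</th><th>Payout</th><th>Date</th></tr>
  </thead>
  <tbody>
    <tr class="odd" role="row"><td>A</td><td>2.10</td><td>3.40</td><td>3.20</td><td>94</td><td>17/06</td></tr>
    <tr class="even" role="row"><td>B</td><td>2.05</td><td>3.50</td><td>3.30</td><td>95</td><td>17/06</td></tr>
  </tbody>
</table>
<table id="other">
  <tr class="odd" role="row"><td>ignored</td></tr>
</table>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Only rows inside the target table that carry both a class and a role attribute.
# The header row (no attributes) and the row in the other table are skipped.
rows = root.findall(".//table[@id='TableCoteHistory']//tr[@class][@role]")

# len() of an element is its number of direct children, here the <td> cells.
lengths = [len(row) for row in rows]
print(lengths)  # [6, 6]
```

Restricting the search to one table id and to rows with both attributes keeps header rows and unrelated tables out of the result, which a bare //tr would include.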

【Comments】:

【Solution 2】:

I tried your approach:

#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//table[@id="TableCoteHistory"]//tr[@class and @role]')

#Check the length of the first 5 rows
a = [len(T) for T in tr_elements[:5]]
print(a)

It changed nothing for me; the output is still empty.
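
An empty list from both `//tr` and the narrower expression means that no tr element matched at all, not that the rows were shorter than expected. One possible cause, which this thread does not confirm, is that the table is built by JavaScript after the page loads, so the HTML returned by requests.get contains no rows. A quick way to test that idea is to count the `<tr>` start tags in the raw response. The sketch below uses only the standard library; the names TagCounter and count_tags and both sample strings are hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags with a given name (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == self.tag:
            self.count += 1


def count_tags(html_text, tag="tr"):
    """Return how many <tag> start tags appear in html_text."""
    counter = TagCounter(tag)
    counter.feed(html_text)
    counter.close()
    return counter.count


# Assumed examples: a page whose table is filled in by a script,
# and the same table once it has been rendered.
static_html = "<html><body><div id='odds'></div><script>render()</script></body></html>"
rendered_html = "<table id='TableCoteHistory'><tr class='odd' role='row'><td>1</td></tr></table>"

print(count_tags(static_html), count_tags(rendered_html))  # 0 1
```

With the code from the question, the call would be count_tags(page.text). If that returns 0, the rows are not in the downloaded HTML and no XPath expression can find them; one option would then be to load the page in a browser through the already imported selenium and parse driver.page_source instead.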

【Comments】: