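【Posted at】: 2021-01-27 11:47:31
【Question】:
I have some HTML that I want to read with pandas. The problem is that the HTML is not a table, even though it looks like one on the website. What I have is this: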
table = '''
<div id="companyResults">
<div class="col-md-12 titles">
<div class="col-md-6"> </div>
<div class="col-md-4">LOCATION</div>
<div class="col-md-2 last">SALES REVENUE ($M)</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.shenzhen_zhaoji_optical_co_ltd.bcf9d7eb4856eb739ec66272a6d9a361.html">
Shenzhen Zhaoji Optical Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Shenzhen,
Guangdong,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.foxconn_industrial_internet_co_ltd.0d4c40a311dbfb1169684a21caa8794c.html">
Foxconn Industrial Internet Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Shenzhen,
Guangdong,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
$40,833.44M</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.boe_technology_group_co_ltd.61b87aa6bc863b69d8d7689703a3ac52.html">
BOE Technology Group Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Beijing,
Beijing,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
$16,495.55M</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.futong_group_co_ltd.85c12cb0d89005d1280cd3c0c13879ff.html">
Futong Group Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Hangzhou,
Zhejiang,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.ofilm_group_co_ltd.515f10b35d850547d16fb6d6875a57d9.html">
OFILM Group Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Shenzhen,
Guangdong,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
$5,355.25M</div>
</div>
'''
I would like output that looks like this:
                                                            LOCATION  SALES REVENUE ($M)
0      Shenzhen Zhaoji Optical Co., Ltd.  Shenzhen, Guangdong, China
1  Foxconn Industrial Internet Co., Ltd.  Shenzhen, Guangdong, China         $40,833.44M
2         BOE Technology Group Co., Ltd.     Beijing, Beijing, China         $16,495.55M
3                 Futong Group Co., Ltd.   Hangzhou, Zhejiang, China
4                  OFILM Group Co., Ltd.  Shenzhen, Guangdong, China          $5,355.25M
I tried:
pd.read_html(str(table))
But I got this:
ValueError: No tables found
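So what is the best way to achieve this? PS: Suggestions that add more detail to each row (such as the href or anything else) are welcome, but not required.
Update: url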
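For context, pd.read_html only looks for <table> elements, which is why it reports "No tables found" on this div-based markup. One possible approach, given the beautifulsoup tag, is to walk the div rows yourself and build the DataFrame from dicts. The following is only a minimal sketch under assumptions: the helper name divs_to_frame, the column names COMPANY and HREF, and the shortened two-row sample_html are made up for illustration (the first column has no header on the page), and it assumes every row follows the col-md-6 / col-md-4 / col-md-2 layout shown in the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Shortened two-row sample with the same structure as `table` in the question.
sample_html = '''
<div id="companyResults">
<div class="col-md-12 titles">
<div class="col-md-6"> </div>
<div class="col-md-4">LOCATION</div>
<div class="col-md-2 last">SALES REVENUE ($M)</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.shenzhen_zhaoji_optical_co_ltd.bcf9d7eb4856eb739ec66272a6d9a361.html">
Shenzhen Zhaoji Optical Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Shenzhen,
Guangdong,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
</div>
</div>
<div class="col-md-12 data">
<div class="col-md-6">
<a href="/business-directory/company-profiles.foxconn_industrial_internet_co_ltd.0d4c40a311dbfb1169684a21caa8794c.html">
Foxconn Industrial Internet Co., Ltd.</a>
</div>
<div class="col-md-4">
<div class="show-mobile">Country:</div>
Shenzhen,
Guangdong,
<br/>
China</div>
<div class="col-md-2 last">
<div class="show-mobile">Sales Revenue ($M):</div>
$40,833.44M</div>
</div>
</div>
'''


def divs_to_frame(html):
    """Hypothetical helper: turn the div-based listing into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove the mobile-only labels ("Country:", "Sales Revenue ($M):")
    # so they do not leak into the cell text.
    for label in soup.select("div.show-mobile"):
        label.decompose()

    rows = []
    for row in soup.select("div.data"):
        name_cell = row.select_one("div.col-md-6")
        location_cell = row.select_one("div.col-md-4")
        revenue_cell = row.select_one("div.col-md-2")
        link = name_cell.find("a")
        rows.append({
            # "COMPANY" and "HREF" are assumed column names, not from the page.
            "COMPANY": name_cell.get_text(" ", strip=True),
            # Collapse the line breaks inside the location into single spaces.
            "LOCATION": " ".join(location_cell.get_text(" ", strip=True).split()),
            "SALES REVENUE ($M)": revenue_cell.get_text(" ", strip=True),
            "HREF": link["href"] if link else None,
        })
    return pd.DataFrame(rows)


df = divs_to_frame(sample_html)
print(df)
```

Passing the full `table` string from the question instead of sample_html should work the same way. Rows without a revenue figure come out as empty strings, which could be converted to NaN afterwards if numeric handling is needed.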
【Comments】:
Tags: python html beautifulsoup