【Question Title】: Convert an HTML table to CSV using pandas (Python)
【Posted】: 2021-03-10 19:40:35
【Question Description】:

This is my code, and it works fine:

import pandas as pd
html_data = """<table id="example" class="table table-hover dataTable no-footer" role="grid" aria-describedby="example_info">
                            <thead>
                            <tr role="row"><th class="sorting_desc" tabindex="0" aria-controls="example" rowspan="1" colspan="1" aria-sort="descending" aria-label="Start Date/Time: activate to sort column ascending">Start Date/Time</th><th class="sorting" tabindex="0" aria-controls="example" rowspan="1" colspan="1" aria-label="End Date/Time: activate to sort column ascending">End Date/Time</th><th class="sorting" tabindex="0" aria-controls="example" rowspan="1" colspan="1" aria-label="Caller Name: activate to sort column ascending">Caller Name</th><th class="sorting" tabindex="0" aria-controls="example" rowspan="1" colspan="1" aria-label="Caller Number: activate to sort column ascending">Caller Number</th><th class="sorting" tabindex="0" aria-controls="example" rowspan="1" colspan="1" aria-label="Callee: activate to sort column ascending">Callee</th><th class="sorting" tabindex="0" aria-controls="example" rowspan="1" colspan="1" aria-label="Used Mins.: activate to sort column ascending">Used Mins.</th><th class="text-center sorting_disabled" rowspan="1" colspan="1" aria-label="File">File</th></tr>
                            </thead>
                            <tbody>
<tr role="row" class="odd"><td class="sorting_1">2020-11-27 12:50:23</td><td>2020-11-27 12:51:04</td><td>ABC 3</td><td>7111</td><td>923333222</td><td>1</td><td class=" text-center"><audio controls="">
        <source src="../record_files_out/3/2020/oc_1.wav.wav" type="audio/ogg">
        <source src="../record_files_out/358/2020-11-27/oc_1934553_358.wav.wav" type="audio/mpeg">
        Your browser does not support the audio element.
            </audio></td></tr></tbody>
                        </table>
"""
print(pd.read_html(html_data)[0].to_csv(index=False, header=True))

This is the output:

2020-11-27 12:50:23,2020-11-27 12:51:04,ABC 3,7111,923333222,1,Your browser does not support the audio element.

But I want to extract

../record_files_out/3/2020/oc_1.wav.wav

instead of this:

Your browser does not support the audio element.
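`pd.read_html` keeps only the text of each cell. Its `extract_links` option (pandas 1.5+) only pulls `href` values from `<a>` tags, so the path in `<source src="...">` is dropped. One way to keep the pandas pipeline is to preprocess the HTML with BeautifulSoup first, replacing each `<audio>` element with the `src` of its first `<source>` child. The sketch below is minimal. It assumes `bs4` and a `read_html` parser backend (such as lxml) are installed, uses a trimmed copy of the question's table, and takes the *first* `<source>` because that matches the desired output.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Trimmed copy of the question's table (same cells, fewer attributes).
html_data = """<table id="example">
<thead><tr><th>Start Date/Time</th><th>End Date/Time</th><th>Caller Name</th>
<th>Caller Number</th><th>Callee</th><th>Used Mins.</th><th>File</th></tr></thead>
<tbody><tr><td>2020-11-27 12:50:23</td><td>2020-11-27 12:51:04</td><td>ABC 3</td>
<td>7111</td><td>923333222</td><td>1</td><td class=" text-center"><audio controls="">
<source src="../record_files_out/3/2020/oc_1.wav.wav" type="audio/ogg">
<source src="../record_files_out/358/2020-11-27/oc_1934553_358.wav.wav" type="audio/mpeg">
Your browser does not support the audio element.
</audio></td></tr></tbody>
</table>"""

soup = BeautifulSoup(html_data, "html.parser")

# Replace every <audio> element with the src of its first <source> child,
# so the cell's text becomes the file path instead of the fallback message.
for audio in soup.find_all("audio"):
    source = audio.find("source")
    audio.replace_with(source["src"] if source is not None else "")

# Wrapping the string in StringIO avoids the deprecation warning that
# newer pandas versions emit when read_html receives literal HTML.
df = pd.read_html(StringIO(str(soup)))[0]
print(df.to_csv(index=False, header=True))
```

If a cell's second `<source>` is the one you need, change `audio.find("source")` to pick it, for example with `audio.find_all("source")[1]`.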

【Question Discussion】:

    Tags: python html web-scraping html-tableextract


    【Solution 1】:

    I would recommend looking at this option, which parses the table with BeautifulSoup:

    # Importing the required modules
    import pandas as pd
    from bs4 import BeautifulSoup

    path = 'html.html'

    # Parse the HTML file; the with-block closes it afterwards
    with open(path, encoding='utf-8') as f:
        soup = BeautifulSoup(f, 'html.parser')

    table = soup.find_all("table")[0]

    # Header: the text of every <th> cell in the first row.
    # find_all() visits only the cell tags, so whitespace text between
    # tags cannot become extra columns and no bare try/except is needed.
    header = table.find("tr")
    list_header = [th.get_text(strip=True) for th in header.find_all("th")]

    # Data: the text of every <td> cell in the remaining rows
    data = []
    HTML_data = table.find_all("tr")[1:]
    for element in HTML_data:
        sub_data = [td.get_text(strip=True) for td in element.find_all("td")]
        data.append(sub_data)

    # Storing the data into a Pandas DataFrame
    dataFrame = pd.DataFrame(data=data, columns=list_header)

    # Converting the Pandas DataFrame into a CSV file
    dataFrame.to_csv('Geeks.csv')
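The loop above still collects only cell text, so the audio cell yields "Your browser does not support the audio element." The sketch below shows one possible adjustment that is not part of the original answer. A hypothetical helper, `cell_value`, returns the `src` of the first `<source>` tag when a cell contains one and the stripped text otherwise. To stay self-contained, it parses an HTML string (a trimmed copy of the question's table) instead of reading `html.html`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumption: trimmed copy of the question's table, inlined instead of
# being read from 'html.html' so the example runs on its own.
html_doc = """<table id="example">
<thead><tr><th>Start Date/Time</th><th>End Date/Time</th><th>Caller Name</th>
<th>Caller Number</th><th>Callee</th><th>Used Mins.</th><th>File</th></tr></thead>
<tbody><tr><td>2020-11-27 12:50:23</td><td>2020-11-27 12:51:04</td><td>ABC 3</td>
<td>7111</td><td>923333222</td><td>1</td><td class=" text-center"><audio controls="">
<source src="../record_files_out/3/2020/oc_1.wav.wav" type="audio/ogg">
<source src="../record_files_out/358/2020-11-27/oc_1934553_358.wav.wav" type="audio/mpeg">
Your browser does not support the audio element.
</audio></td></tr></tbody>
</table>"""


def cell_value(td):
    """Hypothetical helper: first <source> src if present, else the cell text."""
    source = td.find("source")
    if source is not None and source.has_attr("src"):
        return source["src"]
    return td.get_text(strip=True)


soup = BeautifulSoup(html_doc, "html.parser")
table = soup.find("table")

# Header cells from the first row, data cells from every later row
list_header = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]
data = [[cell_value(td) for td in row.find_all("td")]
        for row in table.find_all("tr")[1:]]

dataFrame = pd.DataFrame(data=data, columns=list_header)
print(dataFrame.to_csv(index=False))
```

To read the original file instead, pass an open file handle to BeautifulSoup, as in the answer's `with open(path, ...)` block. Unlike `read_html`, this approach keeps every value as a string, so numeric columns are not converted automatically.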
    

    【Discussion】:

    • Same problem here: I also want to extract "../record_files_out/3/2020/oc_1.wav.wav".