【Question title】:Python Pandas rows values convert to columns values
【Posted】:2017-10-25 13:05:32
【Question】:

I am reading a DataFrame with Python pandas that looks like this:

<style type="text/css">
	table.tableizer-table {
		font-size: 12px;
		border: 1px solid #CCC; 
		font-family: Arial, Helvetica, sans-serif;
	} 
	.tableizer-table td {
		padding: 4px;
		margin: 3px;
		border: 1px solid #CCC;
	}
	.tableizer-table th {
		background-color: #104E8B; 
		color: #FFF;
		font-weight: bold;
	}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>5.6</td><td>5.6</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>5.6</td><td>5.6</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>5.6</td><td>6</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>6</td><td>6</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>6</td><td>6</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
</tbody></table>

I want to create the following DataFrame:

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td></td></tr>
</tbody></table>

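My idea is to insert several empty columns for each of 'Time', 'FUEL_1', 'FUEL_2' and 'Speed', stack those columns one by one, and then merge the results. Is there a simpler way?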

【Discussion】:

    Tags: python pandas stack reshape lreshape


    【Solution 1】:

    I'm fairly sure this could be done easily with pandas.read_html, but I'm more familiar with BeautifulSoup.

    html = """<table class="tableizer-table">
    <thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
     <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
     <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
     <tr><td>3:06:40</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
     <tr><td>3:06:41</td><td>5.3</td><td>5.6</td><td>5.6</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
     <tr><td>3:06:42</td><td>5.6</td><td>5.6</td><td>5.6</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
     <tr><td>3:06:43</td><td>5.6</td><td>5.6</td><td>6</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
     <tr><td>3:06:44</td><td>6</td><td>6</td><td>6</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
     <tr><td>3:06:45</td><td>6</td><td>6</td><td>6</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
    </tbody></table>"""
    
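    import pandas as pd
    from bs4 import BeautifulSoup
    
    def read_table(html):
        """Parse an HTML table's header row and data rows into a DataFrame of strings."""
        header, matrix = [], []
        bs = BeautifulSoup(html, "html.parser")
        for row in bs.find_all("tr"):
            if row.find("th"):
                # Header row: column names (the duplicate "Angle" labels are kept).
                header = [cell.get_text().strip() for cell in row.find_all("th")]
            else:
                # Data row: strip() also removes the &nbsp; (U+00A0) in empty cells.
                matrix.append([cell.get_text().strip() for cell in row.find_all("td")])
    
        return pd.DataFrame(matrix, columns=header)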
    

    Passing the HTML you provided to this function returns a pandas DataFrame, from which you can then select the columns you need:

    df = read_table(html)
    df[["Time","FUEL_1","FUEL_2","Speed"]]
          Time FUEL_1 FUEL_2 Speed
    0  3:06:38   1150         1328
    1  3:06:39                1328
    2  3:06:40          1150  1344
    3  3:06:41                1392
    4  3:06:42   1160         1456
    5  3:06:43                1520
    6  3:06:44          1160  1600
    7  3:06:45                1696
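The reshape the question asks for (one row per Angle reading, with Time, FUEL_1, FUEL_2 and Speed filled only on the first row of each group) could be sketched as follows. This is a minimal sketch under stated assumptions, not code from the thread: the sample values are typed in by hand instead of being parsed from the HTML, the four Angle columns share the single label "Angle" as in the question's header, and the helper name `angles_to_rows` is hypothetical. It repeats each row with `Index.repeat` and flattens the Angle block row by row, which avoids inserting and stacking placeholder columns.

```python
import pandas as pd

# The question's sample data, typed in by hand (assumption: in practice it
# would come from read_table() or another reader). The four "Angle" columns
# deliberately share one label, as in the question's header.
COLUMNS = ["Time", "Angle", "Angle", "Angle", "Angle", "FUEL_1", "FUEL_2", "Speed"]
ROWS = [
    ["3:06:38", 5.3, 5.3, 5.3, 5.3, 1150, None, 1328],
    ["3:06:39", 5.3, 5.3, 5.3, 5.3, None, None, 1328],
    ["3:06:40", 5.3, 5.3, 5.3, 5.3, None, 1150, 1344],
    ["3:06:41", 5.3, 5.6, 5.6, 5.6, None, None, 1392],
    ["3:06:42", 5.6, 5.6, 5.6, 5.6, 1160, None, 1456],
    ["3:06:43", 5.6, 5.6, 6.0, 6.0, None, None, 1520],
    ["3:06:44", 6.0, 6.0, 6.0, 6.0, None, 1160, 1600],
    ["3:06:45", 6.0, 6.0, 6.0, 6.3, None, None, 1696],
]


def angles_to_rows(df, stacked="Angle"):
    """Turn the repeated `stacked` columns into consecutive rows (hypothetical helper).

    Each input row becomes k output rows (k = number of `stacked` columns);
    the other columns keep their value only on the first row of each group.
    """
    is_stacked = df.columns == stacked          # boolean mask over the columns
    values = df.loc[:, is_stacked].to_numpy()   # shape (n_rows, k)
    others = df.loc[:, ~is_stacked]             # Time, FUEL_1, FUEL_2, Speed
    k = values.shape[1]

    # Repeat every original row k times, then renumber the rows 0..n*k-1.
    out = others.loc[others.index.repeat(k)].reset_index(drop=True)
    # A row-major ravel lines the k readings up with the k repeated rows.
    out.insert(1, stacked, values.ravel())

    # Blank (NaN) the other columns except on the first row of each group.
    first_in_group = out.index % k == 0
    for col in others.columns:
        out[col] = out[col].where(first_in_group)
    return out


df = pd.DataFrame(ROWS, columns=COLUMNS)
result = angles_to_rows(df)
print(result.head(8))
```

If the frame comes from `read_table()` instead, every cell is a string and the `&nbsp;` cells are empty strings; the sketch should still work because it only moves values around, though `pd.to_numeric` on the numeric columns may be wanted first.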
    

    【Discussion】:
